AITopics | superhuman performance

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

Neural Information Processing Systems | Feb-14-2026, 10:55:35 GMT

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images.

artificial intelligence, machine learning, pickscore

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Industry: Information Technology (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

73aacd8b3b05b4b503d58310b523553c-Paper-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 22:01:06 GMT

machine learning, natural language, pickscore

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study

Hetz, Martin J., Carl, Nicolas, Haggenmüller, Sarah, Wies, Christoph, Michel, Maurice Stephan, Wessels, Frederik, Brinker, Titus J.

arXiv.org Artificial Intelligence | Jun-4-2024

Large Language Models (LLMs) are revolutionizing medical Question-Answering (medQA) through extensive use of medical literature. However, their performance is often hampered by outdated training data and a lack of explainability, which limits clinical applicability. This study aimed to create and assess UroBot, a urology-specialized chatbot, by comparing it with state-of-the-art models and with the performance of urologists on urological board questions, while ensuring full clinician verifiability. UroBot was developed using OpenAI's GPT-3.5, GPT-4, and GPT-4o models, employing retrieval-augmented generation (RAG) and the latest 2023 guidelines from the European Association of Urology (EAU). The evaluation included ten runs of 200 European Board of Urology (EBU) In-Service Assessment (ISA) questions, with performance assessed by the mean Rate of Correct Answers (RoCA). UroBot-4o achieved an average RoCA of 88.4%, surpassing GPT-4o (77.6%) by 10.8 percentage points. It was also clinician-verifiable and exhibited the highest run agreement as indicated by Fleiss' kappa (κ = 0.979). By comparison, the average performance of urologists on board questions, as reported in the literature, is 68.7%. UroBot's clinician-verifiable nature and superior accuracy compared to both existing models and urologists on board questions highlight its potential for clinical integration. The study also provides the necessary code and instructions for further development of UroBot.
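The two evaluation statistics named in the abstract, mean RoCA over repeated runs and Fleiss' kappa for run agreement, can be sketched as follows. This is a minimal illustration under stated assumptions, not the UroBot code: each run is assumed to be a list of chosen answer letters (one per question), the runs are treated as the "raters" in Fleiss' kappa, and the function names and toy data are hypothetical.

```python
from collections import Counter

def rate_of_correct_answers(run, answer_key):
    """Fraction of questions answered correctly in a single run."""
    correct = sum(1 for given, truth in zip(run, answer_key) if given == truth)
    return correct / len(answer_key)

def mean_roca(runs, answer_key):
    """Mean RoCA across repeated runs over the same question set."""
    return sum(rate_of_correct_answers(r, answer_key) for r in runs) / len(runs)

def fleiss_kappa(runs):
    """Fleiss' kappa, treating runs as raters and questions as subjects."""
    n = len(runs)                  # raters (runs) per subject
    if n < 2:
        raise ValueError("need at least two runs")
    n_subjects = len(runs[0])
    # counts[i][c] = number of runs that chose answer c for question i
    counts = [Counter(run[i] for run in runs) for i in range(n_subjects)]
    categories = set().union(*counts)
    # Per-question observed agreement P_i
    p_i = [(sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts]
    p_bar = sum(p_i) / n_subjects
    # Overall share of all answers that fall into each category
    p_j = {cat: sum(c[cat] for c in counts) / (n_subjects * n) for cat in categories}
    p_e = sum(v * v for v in p_j.values())  # expected agreement by chance
    if p_e == 1.0:
        # Every answer identical: kappa is undefined; returning 1.0 is an assumed convention
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical toy data: 3 runs over 4 questions
answer_key = ["A", "B", "C", "D"]
runs = [
    ["A", "B", "C", "A"],
    ["A", "B", "C", "A"],
    ["A", "B", "D", "A"],
]
print(round(mean_roca(runs, answer_key), 4))  # 0.6667
print(round(fleiss_kappa(runs), 3))           # 0.745
```

Note that run agreement and accuracy are separate properties: runs can agree perfectly (κ near 1) while still being wrong on the same questions, which is why the abstract reports both.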

european association, superhuman performance, urology board question

arXiv.org Artificial Intelligence

2406.01428

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
North America > United States (0.04)
Europe > Switzerland (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Urology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

Kirstain, Yuval, Polyak, Adam, Singer, Uriel, Matiana, Shahbuland, Penna, Joe, Levy, Omer

arXiv.org Artificial Intelligence | Nov-23-2023

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics. Therefore, we recommend using PickScore for evaluating future text-to-image generation models, and using Pick-a-Pic prompts as a more relevant dataset than MS-COCO. Finally, we demonstrate how PickScore can enhance existing text-to-image models via ranking.
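A rough sketch of how a CLIP-style preference scorer can serve both uses described above: predicting which of two images a user prefers, and ranking several generations for the same prompt. The abstract does not spell out the training objective, so the sketch assumes a common pairwise formulation (a softmax over the two images' scores with a cross-entropy loss that allows ties). Embeddings are assumed to be precomputed; the toy vectors, the temperature value, and the names `score`, `preference_prob`, `pairwise_loss`, and `best_of_n` are hypothetical and are not the PickScore implementation or its released API.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def score(text_emb, image_emb, temperature=100.0):
    """Scaled cosine similarity between prompt and image embeddings (CLIP-style)."""
    t, i = normalize(text_emb), normalize(image_emb)
    return temperature * sum(a * b for a, b in zip(t, i))

def preference_prob(text_emb, img_a, img_b):
    """Softmax over the two scores: estimated probability that img_a is preferred."""
    sa, sb = score(text_emb, img_a), score(text_emb, img_b)
    m = max(sa, sb)  # subtract the max for numerical stability
    ea, eb = math.exp(sa - m), math.exp(sb - m)
    return ea / (ea + eb)

def pairwise_loss(text_emb, img_a, img_b, label_a):
    """Cross-entropy vs. the human label: 1.0 = a preferred, 0.0 = b preferred, 0.5 = tie."""
    p = preference_prob(text_emb, img_a, img_b)
    eps = 1e-12
    return -(label_a * math.log(p + eps) + (1 - label_a) * math.log(1 - p + eps))

def best_of_n(text_emb, candidates):
    """Rank candidate images by score; return the index of the top-scoring one."""
    return max(range(len(candidates)), key=lambda k: score(text_emb, candidates[k]))

# Hypothetical 3-d embeddings for one prompt and three generated images
prompt = [1.0, 0.0, 0.0]
images = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(best_of_n(prompt, images))                            # 1
print(preference_prob(prompt, images[1], images[0]) > 0.99)  # True
```

Best-of-n selection is the simplest form of "enhancing a model via ranking": the generator is unchanged, and the scorer only chooses among its outputs.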

dataset, pickscore, text-to-image model

arXiv.org Artificial Intelligence

2305.01569

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions

Fan, Jiajun

arXiv.org Artificial Intelligence | Dec-10-2021

The Arcade Learning Environment (ALE) was proposed as an evaluation platform for empirically assessing the generality of agents across dozens of Atari 2600 games. ALE offers various challenging problems and has drawn significant attention from the deep reinforcement learning (RL) community. From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in ALE. However, is this the case? In this paper, to explore this problem, we first review the current evaluation metrics in the Atari benchmarks and then show that the current criteria for claiming superhuman performance are inappropriate, because they underestimate human performance relative to what is possible. To address these problems and promote the development of RL research, we propose a novel Atari benchmark based on human world records (HWR), which places higher demands on RL agents in terms of both final performance and learning efficiency. Furthermore, we summarize the state-of-the-art (SOTA) methods in Atari benchmarks and provide benchmark results under new evaluation metrics based on human world records. From these new benchmark results, we conclude that at least four open challenges hinder RL agents from achieving superhuman performance. Finally, we discuss some promising ways to address these problems.
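Atari results are usually reported as normalized scores, which makes the difference between the two baselines easy to see. A minimal sketch, assuming the common normalization (agent score minus random-play score, divided by reference score minus random-play score), computed once with an average-human reference (HNS) and once with the human world record as the reference (HWRNS). The paper's exact metric definitions may differ in detail, and the game names and scores below are made-up placeholders, not values from the paper.

```python
def normalized_score(agent, random_play, reference):
    """(agent - random) / (reference - random); 1.0 means the agent matches the reference."""
    if reference == random_play:
        raise ValueError("reference score must differ from the random-play score")
    return (agent - random_play) / (reference - random_play)

def summarize(games):
    """Per-game HNS and HWRNS, plus how many games reach each baseline."""
    hns = {g: normalized_score(s["agent"], s["random"], s["human"]) for g, s in games.items()}
    hwrns = {g: normalized_score(s["agent"], s["random"], s["hwr"]) for g, s in games.items()}
    beats_human = sum(1 for v in hns.values() if v >= 1.0)
    beats_record = sum(1 for v in hwrns.values() if v >= 1.0)
    return hns, hwrns, beats_human, beats_record

# Placeholder scores for two hypothetical games
games = {
    "game_a": {"agent": 5000.0, "random": 200.0, "human": 2000.0, "hwr": 50000.0},
    "game_b": {"agent": 800.0, "random": 100.0, "human": 1500.0, "hwr": 9000.0},
}
hns, hwrns, beats_human, beats_record = summarize(games)
print(beats_human, beats_record)  # 1 0
```

In this toy case the agent is "superhuman" on game_a by the average-human measure (HNS of about 2.67) yet reaches under 10% of the world-record level, which is the kind of gap the review argues the usual criteria hide.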

algorithm, hwrn, saber

arXiv.org Artificial Intelligence

2112.04145

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Leisure & Entertainment > Sports (0.94)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

What is Reinforcement Learning and 9 examples of what you can do with it.

#artificialintelligence | Oct-29-2020, 16:40:19 GMT

Reinforcement Learning is a subset of machine learning. It enables an agent to learn through the consequences of its actions in a specific environment. It can be used to teach a robot new tricks, for example. Reinforcement learning is a behavioral learning model: the agent receives feedback in the form of rewards and, over time, learns which actions lead to the best result. It differs from supervised learning because the agent is not trained on a labeled sample data set; instead, it learns from the rewards that follow its own actions.
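To make "learning through the consequences of actions" concrete, here is a minimal tabular Q-learning sketch. Q-learning is one standard RL algorithm, chosen here for illustration; the article does not name one. The five-state corridor environment, its reward, and the hyperparameters are all assumptions made for the example.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, 1)   # move left or move right

def step(state, action):
    """Environment: move along the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit current estimates, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

def greedy_policy(q):
    """Best learned action for every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]

q = q_learning()
print(greedy_policy(q))  # [1, 1, 1, 1]: always move right, toward the goal
```

No labeled examples appear anywhere: the only training signal is the reward returned by `step`, which is exactly the contrast with supervised learning drawn above.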

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Can RL from pixels be as efficient as RL from state?

Robohub | Sep-16-2020, 20:22:27 GMT

A remarkable characteristic of human intelligence is our ability to learn tasks quickly. Most humans can learn reasonably complex skills like tool-use and gameplay within just a few hours, and understand the basics after only a few attempts. This suggests that data-efficient learning may be a meaningful part of developing broader intelligence. On the other hand, Deep Reinforcement Learning (RL) algorithms can achieve superhuman performance on games like Atari, Starcraft, Dota, and Go, but require large amounts of data to get there. Achieving superhuman performance on Dota took over 10,000 human years of gameplay. Unlike simulation, skill acquisition in the real world is constrained by wall-clock time.

artificial intelligence, machine learning, reinforcement learning

Robohub

Industry: Leisure & Entertainment > Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Can RL from pixels be as efficient as RL from state?

AIHub | Sep-14-2020, 11:00:00 GMT

A remarkable characteristic of human intelligence is our ability to learn tasks quickly. Most humans can learn reasonably complex skills like tool-use and gameplay within just a few hours, and understand the basics after only a few attempts. This suggests that data-efficient learning may be a meaningful part of developing broader intelligence. On the other hand, Deep Reinforcement Learning (RL) algorithms can achieve superhuman performance on games like Atari, Starcraft, Dota, and Go, but require large amounts of data to get there. Achieving superhuman performance on Dota took over 10,000 human years of gameplay. Unlike simulation, skill acquisition in the real world is constrained by wall-clock time.

artificial intelligence, machine learning, reinforcement learning

AIHub

Industry: Leisure & Entertainment > Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Weekly Papers Multi-Label Deep Forest (MLDF); Huawei UK Critiques DeepMind α-Rank

#artificialintelligence | Nov-30-2019, 16:01:34 GMT

Close to a thousand machine learning papers are published each and every week. On Fridays, Synced selects seven studies from the last seven days that present topical, innovative or otherwise interesting or important research that we believe may be of special interest to our readers. Authors: Liang Yang, Xi-Zhu Wu, Yuan Jiang, Zhi-Hua Zhou from the National Key Laboratory for Novel Software Technology, Nanjing University. Abstract: In multi-label learning, each instance is associated with multiple labels, and the crucial task is how to leverage label correlations in building models. Deep neural network methods usually jointly embed the feature and label information into a latent space to exploit label correlations. However, the success of these methods depends heavily on the precise choice of model depth.
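To make "label correlations" concrete: in multi-label learning each instance carries a set of labels rather than one, and some labels tend to appear together. A minimal sketch that estimates one simple form of label correlation, the conditional co-occurrence P(b | a). The labels and data are hypothetical, and this is not the MLDF method, only an illustration of the kind of structure such models try to exploit.

```python
from itertools import permutations

def conditional_cooccurrence(label_sets, labels):
    """P(b | a): fraction of instances tagged with label a that are also tagged with b."""
    result = {}
    for a, b in permutations(labels, 2):
        with_a = [s for s in label_sets if a in s]
        if with_a:  # skip labels that never occur
            result[(a, b)] = sum(1 for s in with_a if b in s) / len(with_a)
    return result

# Hypothetical multi-label data: each instance is tagged with a set of labels
data = [
    {"beach", "sea"},
    {"beach", "sea", "sky"},
    {"sky"},
    {"sea", "sky"},
]
probs = conditional_cooccurrence(data, ["beach", "sea", "sky"])
print(probs[("beach", "sea")])  # 1.0
```

The statistic is asymmetric: here every "beach" instance is also tagged "sea", but only two of the three "sea" instances are tagged "beach", so a model can use the presence of one label as evidence for another.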

algorithm, multi-label deep forest, weekly paper multi-label deep forest

#artificialintelligence

Country: Asia > China > Jiangsu Province > Nanjing (0.25)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How to Best Use AI: Drones or Killer Robots?

#artificialintelligence | Nov-5-2019, 13:58:07 GMT

A small group of mujahidin is trekking through the mountains. They carry their Kalashnikov rifles on their shoulders, but they are not especially worried. The nearest enemy unit is several hours away. So high in the mountains, they would see them coming from a long distance. There are other dangers, though.

ai automation, automation, killer robot

#artificialintelligence

Country: Asia > Afghanistan (0.05)

Industry: